TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Integration of an XML electronic dictionary with linguistic tools for natural language processing

Internal identifier: 000144 (Main/Exploration); previous: 000143; next: 000145

Authors: Octavio Santana Suarez [Spain]; Francisco J. Carreras Riudavets [Spain]; Zenon Hernandez Figueroa [Spain]; Antonio C. Gonzalez Cabrera [Spain]

RBID: Pascal:08-0332344

Abstract

This study proposes encoding the lexical information of electronic dictionaries according to a generic, extendable XML scheme model, and combining it with linguistic tools for natural language processing. Our approach differs from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. Using XML as a container for the information allows other XML tools to be used for carrying out searches or for presenting the information in different resources. This model is particularly important as it combines two parallel paradigms (extendable labelling of documents and computational linguistics), and it is also applicable to other languages. We have included a comparison with the labelling proposal for printed dictionaries made by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145,000 accepted meanings.
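The abstract notes that keeping the dictionary in an XML container lets generic XML tools carry out searches. A minimal sketch of that idea in Python, assuming a toy dictionary of meanings: the element and attribute names (`dictionary`, `entry`, `lemma`, `sense`, `pos`, `def`) and the entries themselves are invented for illustration and are not the schema proposed in the paper. The query relies only on the limited XPath subset supported by the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Toy dictionary of meanings. Element and attribute names are hypothetical,
# chosen for illustration only; they are not the authors' XML scheme.
DICTIONARY_XML = """
<dictionary>
  <entry lemma="bank">
    <sense n="1" pos="noun"><def>Land alongside a river.</def></sense>
    <sense n="2" pos="noun"><def>An institution that keeps money.</def></sense>
    <sense n="3" pos="verb"><def>To deposit money in a bank.</def></sense>
  </entry>
  <entry lemma="river">
    <sense n="1" pos="noun"><def>A large natural stream of water.</def></sense>
  </entry>
</dictionary>
"""


def find_senses(root, lemma, pos=None):
    """Return (sense number, definition) pairs for a lemma, optionally filtered by part of speech."""
    # ElementTree accepts attribute predicates of the form [@name='value'].
    # A lemma containing a single quote would break this simple path string.
    path = f".//entry[@lemma='{lemma}']/sense"
    if pos is not None:
        path += f"[@pos='{pos}']"
    return [(int(s.get("n")), s.findtext("def")) for s in root.findall(path)]


root = ET.fromstring(DICTIONARY_XML)
print(find_senses(root, "bank", pos="noun"))
# → [(1, 'Land alongside a river.'), (2, 'An institution that keeps money.')]
```

Morphology, syllables and phonology are deliberately absent from the toy entries, in line with the abstract's point that such information is supplied by separate linguistic tools rather than stored in the XML.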


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Integration of an XML electronic dictionary with linguistic tools for natural language processing</title>
<author>
<name sortKey="Santana Suarez, Octavio" sort="Santana Suarez, Octavio" uniqKey="Santana Suarez O" first="Octavio" last="Santana Suarez">Octavio Santana Suarez</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Carreras Riudavets, Francisco J" sort="Carreras Riudavets, Francisco J" uniqKey="Carreras Riudavets F" first="Francisco J." last="Carreras Riudavets">Francisco J. Carreras Riudavets</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Figueroa, Zenon Hernandez" sort="Figueroa, Zenon Hernandez" uniqKey="Figueroa Z" first="Zenon Hernandez" last="Figueroa">Zenon Hernandez Figueroa</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Gonzalez Cabrera, Antonio C" sort="Gonzalez Cabrera, Antonio C" uniqKey="Gonzalez Cabrera A" first="Antonio C." last="Gonzalez Cabrera">Antonio C. Gonzalez Cabrera</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">08-0332344</idno>
<date when="2007">2007</date>
<idno type="stanalyst">PASCAL 08-0332344 INIST</idno>
<idno type="RBID">Pascal:08-0332344</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000030</idno>
<idno type="stanalyst">FRANCIS 08-0332344 INIST</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000031</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000021</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000020</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000020</idno>
<idno type="wicri:doubleKey">0306-4573:2007:Santana Suarez O:integration:of:an</idno>
<idno type="wicri:Area/Main/Merge">000157</idno>
<idno type="wicri:Area/Main/Curation">000144</idno>
<idno type="wicri:Area/Main/Exploration">000144</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Integration of an XML electronic dictionary with linguistic tools for natural language processing</title>
<author>
<name sortKey="Santana Suarez, Octavio" sort="Santana Suarez, Octavio" uniqKey="Santana Suarez O" first="Octavio" last="Santana Suarez">Octavio Santana Suarez</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Carreras Riudavets, Francisco J" sort="Carreras Riudavets, Francisco J" uniqKey="Carreras Riudavets F" first="Francisco J." last="Carreras Riudavets">Francisco J. Carreras Riudavets</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Figueroa, Zenon Hernandez" sort="Figueroa, Zenon Hernandez" uniqKey="Figueroa Z" first="Zenon Hernandez" last="Figueroa">Zenon Hernandez Figueroa</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Gonzalez Cabrera, Antonio C" sort="Gonzalez Cabrera, Antonio C" uniqKey="Gonzalez Cabrera A" first="Antonio C." last="Gonzalez Cabrera">Antonio C. Gonzalez Cabrera</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Department of Informática y Sistemas, Edificio de Informática y Matemáticas, Campus Universitario de Tafira, Universidad de Las Palmas de Gran Canaria</s1>
<s2>35017 Las Palmas</s2>
<s3>ESP</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Espagne</country>
<placeName>
<region nuts="2" type="communauté">Canaries</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Information processing &amp; management</title>
<title level="j" type="abbreviated">Inf. process. manag.</title>
<idno type="ISSN">0306-4573</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Information processing &amp; management</title>
<title level="j" type="abbreviated">Inf. process. manag.</title>
<idno type="ISSN">0306-4573</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Coding</term>
<term>Computational linguistics</term>
<term>Linguistic tool</term>
<term>Natural language processing</term>
<term>XML language</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Traitement du langage naturel</term>
<term>Linguistique mathématique</term>
<term>Outil linguistique</term>
<term>Codage</term>
<term>Langage XML</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Codage</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This study proposes encoding the lexical information of electronic dictionaries according to a generic, extendable XML scheme model, and combining it with linguistic tools for natural language processing. Our approach differs from other similar studies in that we propose XML coding of those items from a dictionary of meanings that are less related to the lexical units. Linguistic information, such as morphology, syllables, phonology, etc., will be included by means of specific linguistic tools. Using XML as a container for the information allows other XML tools to be used for carrying out searches or for presenting the information in different resources. This model is particularly important as it combines two parallel paradigms (extendable labelling of documents and computational linguistics), and it is also applicable to other languages. We have included a comparison with the labelling proposal for printed dictionaries made by the Text Encoding Initiative (TEI). The proposed design has been validated with a dictionary of more than 145,000 accepted meanings.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Espagne</li>
</country>
<region>
<li>Canaries</li>
</region>
</list>
<tree>
<country name="Espagne">
<region name="Canaries">
<name sortKey="Santana Suarez, Octavio" sort="Santana Suarez, Octavio" uniqKey="Santana Suarez O" first="Octavio" last="Santana Suarez">Octavio Santana Suarez</name>
</region>
<name sortKey="Carreras Riudavets, Francisco J" sort="Carreras Riudavets, Francisco J" uniqKey="Carreras Riudavets F" first="Francisco J." last="Carreras Riudavets">Francisco J. Carreras Riudavets</name>
<name sortKey="Figueroa, Zenon Hernandez" sort="Figueroa, Zenon Hernandez" uniqKey="Figueroa Z" first="Zenon Hernandez" last="Figueroa">Zenon Hernandez Figueroa</name>
<name sortKey="Gonzalez Cabrera, Antonio C" sort="Gonzalez Cabrera, Antonio C" uniqKey="Gonzalez Cabrera A" first="Antonio C." last="Gonzalez Cabrera">Antonio C. Gonzalez Cabrera</name>
</country>
</tree>
</affiliations>
</record>
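In the record above, a few elements and attributes carry the `inist:` and `wicri:` prefixes without any namespace declaration, so a namespace-aware parser such as Python's `xml.etree.ElementTree` rejects it with an "unbound prefix" error. A minimal sketch, assuming you only need a few bibliographic fields: it wraps a trimmed excerpt of the record in an element that binds the two prefixes to placeholder URIs (`urn:example:inist` and `urn:example:wicri`, invented here and not the official namespace names), then reads the title, author names and ISSN with ElementTree paths. The excerpt keeps a single author and writes the ampersand of the journal title as `&amp;`, as XML requires.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record above (one author only). The inist: and wicri:
# prefixes are used but never declared, exactly as in the displayed record.
RECORD_EXCERPT = """<record><TEI><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Integration of an XML electronic dictionary with linguistic tools for natural language processing</title>
<author>
<name sortKey="Santana Suarez, Octavio" first="Octavio" last="Santana Suarez">Octavio Santana Suarez</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s3>ESP</s3></inist:fA14><country>Espagne</country></affiliation>
</author>
</titleStmt>
<seriesStmt>
<title level="j" type="main">Information processing &amp; management</title>
<idno type="ISSN">0306-4573</idno>
</seriesStmt>
</fileDesc></teiHeader></TEI></record>"""

# Placeholder URIs (assumed, for illustration only) that let the parser
# bind the two prefixes; they are not the official INIST or Wicri namespaces.
WRAPPER = ('<wrapper xmlns:inist="urn:example:inist" '
           'xmlns:wicri="urn:example:wicri">{}</wrapper>')


def summarize(record_xml):
    """Return the article title, author names and journal ISSN of a record."""
    root = ET.fromstring(WRAPPER.format(record_xml))
    file_desc = root.find("./record/TEI/teiHeader/fileDesc")
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "authors": [n.text for n in file_desc.findall("titleStmt/author/name")],
        "issn": file_desc.findtext("seriesStmt/idno[@type='ISSN']"),
    }


# Parsing the excerpt directly fails because the prefixes are unbound.
try:
    ET.fromstring(RECORD_EXCERPT)
except ET.ParseError as err:
    print("strict parse fails:", err)

print(summarize(RECORD_EXCERPT))
```

Binding the prefixes in a wrapper leaves the record text itself untouched; stripping the prefixed nodes before parsing would also work, but would discard the INIST affiliation data.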

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000144 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000144 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:08-0332344
   |texte=   Integration of an XML electronic dictionary with linguistic tools for natural language processing
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024